Abstract.—A new full-length genomic DNA encoding a member of the cyanovirin-N (CV-N) 
homologous protein family has been cloned from the fern species Ceratopteris thalictroides by 
chromosome walking. It is 1993 bp long and contains a 723 bp open reading frame (ORF) that encodes 
a deduced protein (named CtCVNH) of 150 amino acid residues. CtCVNH has a predicted 
isoelectric point (pI) of 4.47 and a calculated molecular mass of 15.9556 kDa. It possesses the 
conserved anti-HIV (human immunodeficiency virus) CV-N domain, as do the 
cyanovirin-N homology (CVNH) members isolated from filamentous ascomycetes and C. 
richardii. Modeling of the tertiary structure indicated that CtCVNH is an elongated, largely β-sheet 
protein that displays internal two-fold pseudosymmetry. Comparative structure analysis of the 
predicted CtCVNH with native CV-N revealed that the major changes occurring 
during the evolution of plant CVNHs were: 1) a length increase at the N- and C-terminal regions; and 2) 
a loop-to-helix transition at the helical-turn regions. Phylogenetic analysis showed that CtCVNH 
groups with the two CVNHs from C. richardii. 

Key Words. — Ceratopteris thalictroides , chromosome walking, single oligonucleotide nested PCR, 
inverse PCR, thermal asymmetric interlaced PCR, CVNH, bioinformatic analysis 
Cyanovirin-N (CV-N) is an 11 kDa anti-HIV (human immunodeficiency 
virus) protein originally isolated from an extract of the cyanobacterium Nostoc 
ellipsosporum (Des.) Rabenh. (Boyd et al., 1997). It consists of a single chain 
of 101 amino acids, exhibits significant internal sequence duplication 
between residues 1-50 and 51-101, and contains two intramolecular disulfide 
bonds (Gustafson et al., 1997). CV-N is composed largely of β-sheets with 
two-fold pseudosymmetry (Bewley et al., 1998). Its antiviral activity depends 
on high-affinity binding to the HIV surface envelope glycoprotein gp120 
(Boyd et al., 1997; Mori et al., 1997). CV-N interacts specifically with 
high-mannose groups (Bolmstedt et al., 2001; Botos et al., 2002), thereby blocking 
the interaction between gp120 and the receptor CD4 on target cells (O’Keefe et 
al., 1997). Besides HIV strains (Boyd et al., 1997), CV-N is also able to 
inactivate simian immunodeficiency virus (SIV), Ebola virus (EBO), herpes 
simplex virus-1 (HSV-1), and hepatitis C virus (Barrientos et al., 2003; 
O’Keefe et al., 2003; Barrientos and Gronenborn, 2005; Helle et al., 2006). The 
potent inactivation of HIV plus unique biophysical properties make CV-N a 
candidate for a topical anti-HIV microbicide. Preclinical development of CV-N 
is underway (Colleluori et al., 2005). 

Recently, a family of CVNH (cyanovirin-N homology) proteins has been identified. 
All CVNH proteins share a common fold that matches the one previously 
thought to be unique to CV-N (Percudani et al., 2005). Current research on 
CVNHs is mainly focused on structural information, antiviral activity, 
carbohydrate-binding specificities, and structure-function relationships 
(Percudani et al., 2005; Koharudin et al., 2008). For example, solution structures of 
three CVNHs from Tuber borchii Vittad., Ceratopteris richardii Brongn., and 
Neurospora crassa Shear et Dodge have been determined (Koharudin et al., 
2008) and may be helpful in elucidating the roles that these proteins play in 
the organs and during evolution. 

CVNHs, which are defined by the anti-HIV domain, show a patchy distribution among organisms. 
They are present in organisms as diverse as cyanobacteria, filamentous 
ascomycetes, and seedless plants (Percudani et al., 2005). However, among 
plants, CVNHs have so far been identified only in the fern C. richardii. To 
provide useful information for understanding the evolution of CVNHs and 
developing antiviral polypeptides, here we report the cloning and sequence 
analysis of the full-length CVNH genomic DNA of Ceratopteris thalictroides 
(L.) Brongn., together with an analysis of CVNH phylogeny and modeling of 
the protein tertiary structure. 

Materials and Methods 

Plant materials.—Ceratopteris thalictroides was collected from Wuhan 
Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei, China. 
Young and healthy leaves were sampled, immediately frozen in liquid N2, and 
stored at -70°C until used. 

Genomic DNA extraction. —Total genomic DNA was extracted from fresh 
leaves following a modified CTAB protocol (Su et al., 1998). DNA 
concentration and purity were determined by measuring UV absorption using 
a Pharmacia 2000 UV/Visible spectrophotometer. DNA integrity was checked 
by 1.0% agarose gel electrophoresis. 

Molecular cloning of the full-length genomic DNA.— Based on the C. 
richardii EST sequence (Accession No. BQ087187), specific primers were 
designed to amplify the internal region of CVNH in C. thalictroides. The 
forward primer CVNH-F was 5'-GTGGGCGTCTAGCGATTTCCTTT-3', and the 
reverse primer CVNH-R was 5'-ATCATCCGCTGCTTGCTTCTTCG-3'. The 
reaction mixture (20 µL) contained 50 ng template DNA, 40 pmol each primer, 
1 pmol each dNTP, 1.0 U Taq DNA polymerase and 1× Taq polymerase 
buffer. PCR was performed using the following protocol: the template was 
Table 1. The primers used in chromosome walking. 

Primer name    Primer sequence 
5'IPCR         5'-GGTGATATTGCCCGTCGGTGCCTTT-3' 
5'SON-1        5'-ATCACTGTTGAGGCAATCTGCGGCT-3' 
5'SON-2        5'-GCTGCGATCAAGACGATGAGAAAAC-3' 
5'SON-3        5'-CCATCGCTTCTAGGAGTAAACAGAC-3' 
3'TAIL-1       5'-GTGCAAAGGCACCGACGGGCAATAT-3' 
3'TAIL-2       5'-GGGGTGTTGGATTTCTGTGGCTATG-3' 
3'TAIL-3       5'-AAGCGAAGAAGCAAGCAGCGGATGA-3' 

denatured at 94°C for 5 min, followed by 36 cycles of amplification (94°C for 
50 s, 61°C for 50 s, 72°C for 90 s) and a final extension of 10 min at 72°C. 

Based on the sequence obtained from the internal DNA region, two sets of 
nested primers were designed to amplify the 5' and 3' flanking sequences: one 
for 5' single oligonucleotide nested PCR (SON-PCR) (Antal et al., 2004) and one 
for 3' inverse PCR (IPCR) (Triglia et al., 1988) combined with thermal 
asymmetric interlaced PCR (TAIL-PCR) (Liu and Whittier, 1995). These primers included 
5'IPCR, 5'SON-1, 5'SON-2, 5'SON-3, 3'TAIL-1, 3'TAIL-2, and 3'TAIL-3 
(Table 1, Fig. 1). They had high annealing temperatures and were synthesized 
by Invitrogen (Shanghai). 
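
As a rough check on the statement that the nested primers have high annealing temperatures, the sequences listed in Table 1 can be summarized by length and GC content. The sketch below is illustrative only: the melting-temperature estimate uses a common rule-of-thumb formula, Tm = 64.9 + 41 × (G + C - 16.4) / N, which is an assumption here, because the text does not state how annealing temperatures were evaluated. The function names are hypothetical.

```python
# Primer sequences (5'->3') copied from the Methods text and Table 1.
PRIMERS = {
    "5'IPCR": "GGTGATATTGCCCGTCGGTGCCTTT",
    "5'SON-1": "ATCACTGTTGAGGCAATCTGCGGCT",
    "5'SON-2": "GCTGCGATCAAGACGATGAGAAAAC",
    "5'SON-3": "CCATCGCTTCTAGGAGTAAACAGAC",
    "3'TAIL-1": "GTGCAAAGGCACCGACGGGCAATAT",
    "3'TAIL-2": "GGGGTGTTGGATTTCTGTGGCTATG",
    "3'TAIL-3": "AAGCGAAGAAGCAAGCAGCGGATGA",
}
CVNH_R = "ATCATCCGCTGCTTGCTTCTTCG"  # reverse primer of the specific PCR

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def rough_tm(seq: str) -> float:
    """Rule-of-thumb melting temperature in degrees C (assumed formula, see lead-in)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:9s} {len(seq)} nt  GC={gc_content(seq):.0%}  Tm~{rough_tm(seq):.1f} C")
```

All seven primers are 25 nt long. The same helpers also show that 3'TAIL-3 overlaps the reverse complement of CVNH-R, which is consistent with its position at the 3' end of the internal fragment.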

The 5' flanking sequence was amplified by SON-PCR. The primary PCR was 
carried out in a 20 µL volume containing 50 ng genomic DNA, 50 pmol single 
primer (5'SON-1), 50 µmol/L each dNTP, 2.0 U Taq DNA polymerase and 1× 
Taq polymerase buffer. For the secondary PCR, two single primers (5'SON-2 
and 5'SON-3) were used separately. The reaction solution was the same as that 
of the primary PCR except that 1 µL of a 1:50 dilution of the primary PCR products 
was used as the template. 

The 3' flanking sequence was obtained using IPCR combined with TAIL-PCR. 
Ceratopteris thalictroides genomic DNA was digested with Pac I (NEB, BSA, 
5 U µg⁻¹ of DNA) at 37°C for 3 h, and then heated at 65°C for 20 min. The 
digested DNA was self-ligated overnight at 15°C at a concentration of 
0.3–0.5 µg/ml in the presence of 3 U/ml T4 DNA ligase (Promega). PCR was carried 
out in a 20 µL volume with 1 µL ligated product, 1 pmol each dNTP, 40 pmol 
each primer (5'IPCR and 3'TAIL-1), 1.0 U Taq DNA polymerase and 1× Taq 
polymerase buffer. The primary PCR of TAIL-PCR was performed using primer 


[Figure 1: primers 3'TAIL-1, 3'TAIL-2, and 3'TAIL-3 are drawn above the 5'-3' sequence line; primers 5'SON-3, 5'SON-2, 5'SON-1, and 5'IPCR are drawn below it.] 

Fig. 1. Schematic view of the position and orientation of the nested primers used in this study and of 
their positions relative to the sequence amplified by the specific PCR. The rectangle indicates 
the sequence obtained by specific PCR, whereas the line represents regions determined by further 
chromosome walking. 
Table 2. Cycling conditions used for SON-PCR, IPCR, and TAIL-PCR. 

Name         Reaction    Cycle no.   Thermal condition 
5'SON-PCR    Primary     1           94°C (5 min) 
                         5           94°C (30 s), 65°C (1 min), 72°C (2.5 min) 
                         1           94°C (30 s), 29°C (3 min), ramping to 72°C over 3 min, 
                                     72°C (2.5 min) 
                         60          94°C (30 s), 65°C (1 min), 72°C (2.5 min) 
                         1           72°C (7 min) 
             Secondary   1           94°C (5 min) 
                         30          94°C (30 s), 65°C (1 min), 72°C (2.5 min) 
                         1           72°C (7 min) 
3'IPCR       IPCR        1           94°C (10 min) 
                         33          94°C (1 min), 65°C (1 min), 72°C (2 min) 
                         1           72°C (10 min) 
3'TAIL-PCR   Primary     1           93°C (1 min), 95°C (1 min) 
                         5           94°C (30 s), 65°C (1 min), 72°C (2.5 min) 
                         1           94°C (30 s), 25°C (3 min), ramping to 72°C over 3 min, 
                                     72°C (2.5 min) 
                         15          94°C (30 s), 65°C (1 min), 72°C (2.5 min) 
                                     94°C (30 s), 65°C (1 min), 72°C (2.5 min) 
                                     94°C (30 s), 44°C (1 min), 72°C (2.5 min) 
                         1           72°C (5 min) 
             Secondary   1           94°C (1 min) 
                         15          94°C (30 s), 65°C (1 min), 72°C (2.5 min) 
                                     94°C (30 s), 65°C (1 min), 72°C (2.5 min) 
                                     94°C (30 s), 44°C (1 min), 72°C (2.5 min) 
                         1           72°C (5 min) 

3'TAIL-1 as the gene-specific primer and primer AD 
(5'-TC(G/C)TICGNACIT(A/T)GGA-3') (Liu and Whittier, 1995) as the arbitrary degenerate primer in 
a total volume of 20 µL that contained 1 µL of a 1:50 dilution of the IPCR products, 
2 pmol each dNTP, 40 pmol primer 3'TAIL-1, 500 pmol primer AD, 2.0 U Taq 
DNA polymerase and 1× Taq polymerase buffer. For the secondary PCR, two 
gene-specific primers (3'TAIL-2 and 3'TAIL-3) were used separately with the 
same arbitrary primer as in the primary reaction. The reaction solution was 
the same as that used for the primary PCR except that 1 µL of a 1:50 dilution of 
the primary PCR products was used as the template. Thermocycling profiles 
used for SON-PCR, IPCR, and TAIL-PCR are listed in Table 2. 

Recovery of PCR products. —PCR products were purified by running them 
through a 1.0% low melting agarose gel. The desired DNA band was cut out 
and recovered using the DNA rapid purification kit (Omega). 

DNA cloning and sequencing.—A purified PCR product was ligated into a 
pMD19-T (TaKaRa) vector and then used to transform competent Escherichia 
coli DH5α cells. A positive clone was identified by blue/white selection and 
confirmed by PCR. Purified plasmid DNA was sequenced in both directions 
by standard methods on an ABI 3730 automated sequencer at Invitrogen 
(Shanghai). Primers M13F and M13R, located on the pMD19-T vector, were used 
for sequence determination. 
In silico analysis and molecular modeling. —ORF finder was used to predict 
the coding sequence, and promoter analysis was performed online 
(http://www.fruitfly.org/cgi-bin/seq_tools/promoter.pl). Sequence analysis was conducted 
using the BLAST program (Altschul et al., 1997) and other programs available 
at the ExPASy server (Gasteiger et al., 2003). Multiple sequence alignment was 
carried out using the ClustalX software (Thompson et al., 1997). Figures of 
multiple sequence alignments annotated with secondary structure elements were 
generated with ESPript (Gouet et al., 1999). Primary structure analysis of the 
deduced CtCVNH (CVNH protein from C. thalictroides) was conducted with 
ProtParam (Gasteiger et al., 2005) on the ExPASy server 
(http://www.expasy.ch/tools/protparam.html). Secondary structure was predicted 
with the SOPMA program (Geourjon and Deleage, 1995) online 
(http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html). Phylogenetic 
analysis was carried out using programs from the PHYLIP package; genetic 
distances were estimated with PROTDIST using the Jones-Taylor-Thornton 
model of amino acid substitutions. Neighbor-joining trees (Saitou and Nei, 
1987) were constructed using the NEIGHBOR program; 1000 bootstrap 
replicates were generated and summarized with 
the SEQBOOT and CONSENSE programs. Phylogenetic trees were rendered 
with the TREEVIEW program (Page, 1996). The three-dimensional (3D) 
structural models of CtCVNH were built by the homology-based method using 
the SWISS-MODEL program (Guex and Peitsch, 1997; Schwede et al., 2003; 
Arnold et al., 2006). The template used for modeling was C. richardii CVNH 
(PDB code 2jzjA) (Koharudin et al., 2008). Models were displayed with the 
PyMOL program (DeLano, 2002). 
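
The bootstrap step in the phylogenetic analysis resamples alignment columns with replacement to create pseudo-replicate alignments, each of which is then used to build a tree. The following minimal sketch illustrates that resampling idea only; it is not the PHYLIP implementation, and the toy alignment, sequence names, and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap replicate: columns drawn with replacement."""
    n_cols = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every sequence,
    # so each resampled column is an intact column of the original.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment (not data from this study).
toy = {"seqA": "MKT-LV", "seqB": "MRTALV", "seqC": "MKSALI"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
```

Each replicate has the same dimensions as the original alignment. Support for a grouping is then the fraction of replicate trees that contain it.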


Results and Discussion 

Molecular cloning of the full-length genomic DNA. —Using a pair of specific 
primers (CVNH-F and CVNH-R), a single fragment of 775 bp was amplified 
from the C. thalictroides DNA [Fig. 2(a)]. Compared with the C. richardii cDNA 
sequence (Accession No. BQ087187), the sequence from C. thalictroides has 
two additional fragments that do not exist in C. richardii cDNA and the 
remaining parts of the sequence are identical to the C. richardii cDNA (Fig. 3). 
The CtCVNH intron-exon boundaries were thus deduced; the gene is composed of 
three exons and two introns. Based on the amplified sequence of the specific 
PCR, two sets of nested primers were further designed to obtain the 5' and 3' 
flanking sequences, respectively. A clear single band of ~800 bp of the 5' 
flanking sequence was generated in the secondary reaction [Fig. 2(b)] using 
SON-PCR, while a ~750 bp 3' flanking sequence was amplified through IPCR 
combined with TAIL-PCR [Fig. 2(c)]. 

Sequence analysis of the CtCVNH gene. —The cloned full-length CtCVNH 
gene is 1993 bp in length, including an 818 bp 5' untranslated region (UTR), a 
452 bp 3' UTR, and a 723 bp coding region. The 5'UTR has a TATA 
box in the predicted promoter elements. The ATG start codon, which is 
numbered +1 to +3, is flanked by G in both positions -3 (3 nucleotides before 
Fig. 2. Agarose gel electrophoresis of the specific PCR (a), SON-PCR (b), and IPCR+TAIL-PCR (c) 
products. M is the molecular weight marker (DL2000). a 1: about 750 bp fragment generated by specific 
PCR with primers CVNH-F and CVNH-R. b 1: smeared bands produced by the first reaction of 
SON-PCR with primer 5'SON-1. 2: no clear band produced by the secondary reaction of SON-PCR with 
primer 5'SON-2. 3: a single band obtained by the secondary reaction of SON-PCR with 
primer 5'SON-3. c 1: the DNA fragment amplified by the secondary reaction of TAIL-PCR with primers 
3'TAIL-2 and AD. 

the ATG codon) and +4 (1 nucleotide after the ATG codon), indicating that it is 
located in a sequence context for strong translational initiation (Kozak, 1999). 
The 3'UTR has a polyadenylation signal (AATAAA) and six ATTT domains 
(Fig. 4). These ATTT domains may be important for mRNA destabilization 
(Shaw and Kamen, 1986). The CtCVNH gene encodes a deduced protein of 150 
amino acid residues with a predicted isoelectric point (pI) of 4.47 and a 
calculated molecular mass of 15.9556 kDa. Regarding its amino acid 
composition, the most abundant residue is Ser (13.3% by frequency), followed by 
Gly (9.3%), Leu (9.3%), Ala (7.3%), Asn (7.3%), Asp (7.3%), and Val (6.7%). 
Acidic and basic amino acids constitute 10.0% and 5.3% of the protein, 
respectively. Moreover, 15.3% of the amino acids are charged, and the 
percentages of polar and hydrophobic amino acids are 64.0% and 25.3%, 
respectively (Table 3). 
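
The composition figures can be re-derived directly from the deduced protein sequence printed in Fig. 4. The sketch below is a simple tally and not the ProtParam program itself; the sequence is transcribed from the translation lines of Fig. 4, and the variable and function names are hypothetical.

```python
from collections import Counter

# Deduced CtCVNH protein, transcribed from the translation lines of Fig. 4.
CTCVNH = (
    "MEYFTPRYLLLQGFLIVLIA"
    "ASNASAQ"
    "CDFSYSCKDVTVTGN"
    "FLAADCLNSDGAYDRSSLNM"
    "NDMIGNSNGRLVFPGTSFRN"
    "SCLSVEINDGHTLTASCKGT"
    "DGQYHPTSLDLNSCVYNADG"
    "VLDFCGYGVGKSTAYVKSST"
    "V"
    "SEEASSG"
)

counts = Counter(CTCVNH)


def percent(n: int) -> float:
    """Frequency of n residues as a percentage of the chain, one decimal place."""
    return round(100.0 * n / len(CTCVNH), 1)


acidic = counts["D"] + counts["E"]  # Asp + Glu
basic = counts["K"] + counts["R"]   # Lys + Arg

if __name__ == "__main__":
    print("length:", len(CTCVNH))
    for residue, n in counts.most_common():
        print(residue, n, percent(n))
    print("acidic:", acidic, percent(acidic))
    print("basic:", basic, percent(basic))
```

The tally gives 150 residues with 20 Ser (13.3%), 15 acidic residues (10.0%), and 8 basic residues (5.3%), in agreement with the values reported above.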

With regard to the predicted secondary structure, the CtCVNH protein 
consists of 16.00% alpha helices, 28.67% extended strands, 12.67% β-turns, 
and 42.67% random coils. Extended strands and random coils, interlaced with 
each other, make up the main part of the secondary structure. 


[Figure 3: exon-intron diagrams of CtCVNH and CrCVNH.] 

Fig. 3. Schematic view of the exon and intron positions deduced from C. richardii cDNA. 
Exons and introns are indicated by rectangles and lines, respectively. 
CCATCGCTTCTAGGACTAAACAGACATACTATAACTGC -781 

ACTCTAACACATTTTCAACAATACAAGAAACATTTATACATACTTCACACGTGCACATCA -721 
TCACACATACACACATACTATGTGTGCATACATGCTCACATGCATAGTAGCATGCATATA -661 
CACATAGTAAAT AATGCACATGCATACATATATACACATGCACATTATTTCGGTGGGCCT -601 
TCAAATGCTAGTTAAAAATTTATTGTTATTATCTTTTTAAATGGAGTTCCGGCAAGTGGT -541 
TACTCCCCCGAGCTTTAACTTTGAAAGAATAACTTAACACGTGAATGGACATAGTGTTCG -481 
AGGGACCCCTCGTATCCCACAAAGTAAGAGTTGGATAAAATTACAACTTGTTTCCATTGG -421 
TrTAGACTGGCCAAGGCCATTACACGCTCAAGCCAGAGATTGAACTTTGAACTTCCACCT -36 1 
GAAGAGAGCAAAGACTCAACCACTAGAACAACGAAATGTGAACTTAAAAATTTATTATTG -301 

GTTGTTTTGAAAAAMTTC7TTATTACAATCTTAATTTTCATTTGATTGCCCAATTTTTA -241 
GAAAGGCAGTCACGGAGACCAAACGCTAAGTCCACTGCTCGTTTGCAAAGGAGAACGTTT -181 

GTCTTTATCGAGTATTATATACAGAAACGCTAAGTTGCGCCACATGTCATGAAATAAGCG -121 
TCCGAGATCCAAAGACGTTAACTTTCCCTACTA |fATAAAlfc GGGAAGCTTGAGACCATCT -61 
GCCCAGTCCGmTTajTCTAGCGAmCCTTTCGAAGTGTCTGTnACTCCTAGAAGCG -1 

ATGGAGTACTTTACACCGCGATATCrTCTCCTTCAAGGTTTTCTCATCGTCTTGATCGCA 60 
MEYFTPRYLLLQGFLIVLIA 
GCATCGAATGCCAGCGCTCAAgtaagtctttctgatttccagtttcatatgttaaatctc 120 

A S N A S A Q 

tctctctctctctctctctctctctctctctctcccaattctggtttgtgatgttgtatc 180 

atcgttctcgtctgcaTGCGATTTCTCATACTCGTGCAAGGATGTGACTGTAACCGGCAA 240 

CDFSYSCKDVTVTGN 
CTTCCTAGCCGCAGATTGCCTCAACAGTGATGGTGCATACGATCGGTCTTCTCTGAATAT 300 

FLAADCLNSDGAYDRSSLNM 
GAA(XjACATGATTGGTAATAGTAATGGAAGG(TrGTATTTCCCGGTACCTCCTTCCGTAA 360 

NDMIGNSNGRLVFPGTSFRN 
TTCATGCTTGAGTGTGGAGATCAACGACGGTCATACGCTCACAGCTTCXJTGCAAAGGCAC 420 

SCLSVEINDGHTLTASCKGT 
CGACGGGCAATATCAIXCTACAAGCCTTCATCTCAACTCTTGCGTTTATAATGCTGACGG 480 

DGQYHPTSLDLNSCVYNADG 
GGTGTTGGATTTCTGTGGCTATGGTGTCXXiAAAATCAACAGCCTACGTCAAGTCCAGTAC 540 

VLDFCGYGVGKSTAYVKSST 
CGTgtaatgtccctgcaacaagtactgccgtagttatattatatatcgttacttcacct 600 

V 

ctcaggaaatgctttactttgcgctctacaacactagctattctacttatatatggcatt 660 

tgc tggacgt t gtt tatattaat11 tc 11 tctc 11cac AAGCGAAGAAGCAAGCAGCGGA 720 

S E E A S S G 

TGATTTGAGCATGTGCTTCACTCTCTXK^AAACTrCCTCT ATTATGCTGTGACGTTGTCTA 780 

TCG AAGCCACACATCGGATT AT ATAT AACAT AGCGTCTTITCATTTGGGTG ATATGCTCC 840 

TCGCGTTCTCTTTCCCTGCTCTTTCTTTGTCTTTGTTATGtXXM^CTCAAGTTCTGCATGT 900 

ATGATTTTATTCCTACGTGCTG TATTGATATCAGTAGATGTGTCCTATATTCATTTCACC 960 

CTTCATTAATTAAGTATGTCAATGTCAACAGAAGTTACTCTC TAATAAAA TCCATACGAA 1020 

ATCTCCTGC ATTCAG AAACCC AC AAAT AGAG AATTTCT AAC AATGTC AGTTTATTACGTC 1080 

T ATTT ATTC AT AACCC AC ATCT AACT AACGCG AGTT AC ACT AT AGGAAATCTCTTTCTAC 1140 

TATTTTGGATTGTTAGTTT ATTACGTCTATTT ATT 1175 


Fig. 4. Nucleotide and deduced protein sequences of the CtCVNH gene. The predicted amino acid 
sequence is shown below its open reading frame. The predicted promoter sequence is shown in a 
shaded box (transcription start site shown in larger font). The TATA box is boxed with solid lines. 
The polyadenylation signal is underlined and in boldface. The ATTT regions of the 3'UTR are 
underlined. The introns are shown in lowercase letters. 
Table 3. Component analysis of the amino acid sequence of CtCVNH. 

Amino acid                Number    Frequency (%) 
Hydrophobic amino acid    38        25.3 
Charged amino acid        23        15.3 
Polar amino acid          96        64.0 
Acidic amino acid         15        10.0 
Basic amino acid          8         5.3 
Ala(A)                    11        7.3 
Gly(G)                    14        9.3 
Met(M)                    3         2.0 
Ser(S)                    20        13.3 
Cys(C)                    7         4.7 
His(H)                    2         1.3 
Asn(N)                    11        7.3 
Thr(T)                    10        6.7 
Asp(D)                    11        7.3 
Ile(I)                    4         2.7 
Pro(P)                    3         2.0 
Val(V)                    10        6.7 
Glu(E)                    4         2.7 
Lys(K)                    4         2.7 
Gln(Q)                    3         2.0 
Trp(W)                    0         0.0 
Phe(F)                    7         4.7 
Leu(L)                    14        9.3 
Tyr(Y)                    8         5.3 
Arg(R)                    4         2.7 


Amino acid sequence alignment and phylogenetic analysis. —Initial homology 
searches were conducted with the deduced CtCVNH amino acid sequence 
against the non-human, non-mouse EST database at the NCBI (National Center for 
Biotechnology Information, NIH, Bethesda) using the tblastn program 
(Altschul et al., 1997). These searches uncovered a new CVNH member from the plant 
Selaginella moellendorffii Hieron. The results 
(Table 4) showed that CVNH members are present in fungi and plants (E 
value < 0.01). More than 70% of the members occur in fungi. A comparison of 
the deduced CtCVNH with other CVNHs revealed that CtCVNH shares a 
high degree of similarity with the two CVNHs from C. richardii (99% and 53% 
identity, respectively), and a lower level of similarity with the CVNHs from 
fungi (26–33%). Multiple sequence alignment indicated that the anti-HIV 
domain is conserved [Fig. 5(a)]. The most conserved sites were F4, L18, G27, 
L36, G41, N42, G45, F54, L69, G78, L87, N93, and G96 (numbered according 
to the N. ellipsosporum CV-N). These residues are predominantly 
located in the hydrophobic core region and are involved in hydrophobic 
interactions between the β-hairpin and the underlying triple-stranded β-sheet 
of each repeat (Percudani et al., 2005). Also conserved are hydrophilic amino 
acids involved in the formation of the hydrogen-bonded bridges that connect 
Fig. 5(a). Multiple alignment of the CVNHs. Sequence conservation is visualized according to the 
ESPript chemical equivalence measure, with a similarity threshold set to emphasize strictly 
conserved positions. Invariant residues are boxed in black; physico-chemically equivalent residues 
are boxed in gray. The deduced secondary structure of CtCVNH is shown above the alignment. The 
predicted N-termini of the mature fern polypeptides are indicated by a black triangle. 

Fig. 5(b). Comparison of the amino acid sequences of domains 1-76 and 77-150 of CtCVNH. Sequence 
homology of the domains was maximized by insertion of gaps (-). Identical amino acids (•) and 
conserved amino acids (*) are indicated. 


β-strands 1–9 and 4–6 (Bewley et al., 1998). This conservation suggests a critical structural 
role for these residues or their involvement in carbohydrate binding. 

Sequence similarity was also examined between the first half (residues 1-50, 
according to the numbering in the N. ellipsosporum CV-N) and the second half 
(residues 51-101) of the CVNHs. Like CV-N, all CVNHs comprise two tandem 
sequence repeats with identities ranging from 24.0% to 41.1% (data not 
shown). Several residues are completely conserved [Fig. 5(b)]. The apparent 
sequence similarity between the two repeats (with an average identity of 
33.3%) can be ascribed to the structural constraints imposed by the 
symmetrically interconnected CVNH fold (Percudani et al., 2005). 
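
Identity between the two repeats is a percentage computed over an alignment of the first half against the second half. The text does not specify how gapped positions were treated, so the sketch below assumes that columns containing a gap in either repeat are excluded from the denominator. The function name and the two aligned strings are hypothetical and are not taken from Fig. 5(b).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither side has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned repeat fragments, for illustration only.
repeat_1 = "LGKF-SQTC"
repeat_2 = "LGNFASETC"
identity = percent_identity(repeat_1, repeat_2)  # 6 of 8 compared columns match
```

Other conventions, such as dividing by the length of the shorter repeat or by the full alignment length, would give different percentages for the same alignment.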

A neighbor-joining tree (Fig. 6) was constructed to analyze the phylogenetic 
relationships of CtCVNH with other CVNHs (Table 4). It shows that CtCVNH is 
closely related to the member from C. richardii (BQ087187), and CVNHs 
[Figure 6: neighbor-joining tree of the CVNH proteins, including Ceratopteris thalictroides; scale bar, 10 PAM.] 

Fig. 6. Phylogeny of the CVNH proteins. The unrooted tree was constructed by neighbor-joining 
analysis (Saitou and Nei, 1987) of genetic distances estimated with the Jones-Taylor-Thornton 
model. Branch lengths are proportional to genetic distances as indicated by the scale bar 
representing 10 PAMs (point-accepted mutations). 


belonging to different phyla form monophyletic groups. The CVNH domains 
may have a common origin; however, Percudani et al. (2005) suggested that in 
fungi and seedless plants the domain has been separately amplified with 
different copy numbers following the separation of these two lineages. 

Predicted CtCVNH tertiary structure and the structural evolution of CVNHs/ 
CV-N. —Understanding the structural properties of CtCVNH is important for 
clarifying the conservation and variation of CVNHs as well as the roles they 
play in plants. In silico methods exist to predict with high reliability the 
tertiary structure of proteins from template structures (Saenz-Rivera et al., 
2004; Gopalasubramaniam et al., 2008). Predicting a structure can yield 
insights into potential evolutionary patterns for CVNHs. Because CtCVNH and 
Fig. 7. Panel (a): overlay of predicted CtCVNH (gray) and native CV-N (black) tertiary structures. 
Panel (b): overlay of CrCVNH (gray) and native CV-N (black) tertiary structures. β-strands are 
indicated with β1-10 and helical turns with α1-4. Arrows show the two sugar-binding pockets 
of CV-N. 


C. richardii CVNH (CrCVNH) are approximately 50% identical, we predicted 
the tertiary structure of CtCVNH using CrCVNH as a template. Fig. 7a 
shows that the predicted CtCVNH comprises two tandem sequence repeats. 
They form equivalent, elongated structures, each combining a triple-stranded 
β-sheet and a β-hairpin. Thus two symmetrically related fold domains 
are created, each containing a sugar-binding site. Fig. 7a also indicates 
that the CtCVNH structure is quite similar to that of native CV-N, including the 
positions of the triple-stranded antiparallel β-sheets (first sequence repeat: β1, 
β2, and β3; second: β6, β7, and β8), the β-hairpins (formed by β4 and β5, and by β9 and 
β10, respectively), and the α-helical turns (α1-4). However, the structures differ in 
that the N- and C-terminal regions are longer in CtCVNH than in CV-N, the 
helical turn α3 folds differently, and an α-helix (3/4 turn) exists within the 
C-terminal region of the predicted CtCVNH. Moreover, the β1 and β6 strands are 






AMERICAN FERN JOURNAL: VOLUME 99 NUMBER 2 (2009) 


shorter in CtCVNH than in CV-N. To further understand CVNH evolution 
in plants, we also compared the tertiary structure of CrCVNH with CV-N. 
Fig. 7b shows that the native CrCVNH structure is more similar to that of 
native CV-N, and most differences exist in the helical turn regions (α2, α3, α4) 
rather than in the β-strands. It is worth noting that these differences 
are located in the sugar-binding pockets of the proteins, which implies that 
CrCVNH and CV-N may have different affinities for mannose disaccharide 
ligands (Percudani et al., 2005). 

In conclusion, molecular cloning and characterization of CtCVNH showed 
that it is very similar to other CVNHs from ascomycete fungi and the 
fern C. richardii and has a typical anti-HIV domain [Fig. 5(a), 7], indicating that 
CtCVNH belongs to the CVNH family. This is the first time a full-length genomic 
DNA of a CVNH has been cloned from a plant. Our results provide a basis for a 
deeper understanding of CVNH function and evolution. 